{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Classification Tutorial\n",
    "\n",
    "This tutorial shows how to use Tribuo's classification models to predict Iris species using Fisher's well-known Iris dataset (it's 2020 and we're still using a dataset from 1936 in demos, but not to worry, we'll use MNIST from the 90s next time). We'll focus on a simple logistic regression, and investigate the provenance and metadata that Tribuo stores inside each model.\n",
    "\n",
    "## Setup\n",
    "You'll need to get a copy of the irises dataset.\n",
    "\n",
    "`wget https://archive.ics.uci.edu/ml/machine-learning-databases/iris/bezdekIris.data`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's Java, so first we load in the necessary Tribuo jars. Here we're using the classification experiments jar, along with the JSON interop jar to read and write the provenance information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%jars ./tribuo-classification-experiments-4.3.0-jar-with-dependencies.jar\n",
    "%jars ./tribuo-json-4.3.0-jar-with-dependencies.jar"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import java.nio.file.Paths;\n",
    "import java.nio.file.Path;\n",
    "import java.nio.file.Files;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We import everything from the base `org.tribuo` package, along with `TrainTestSplitter`, the simple CSV loader, and the classification and classification evaluation packages. We're going to build a logistic regression, so we'll need that too."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import org.tribuo.*;\n",
    "import org.tribuo.evaluation.TrainTestSplitter;\n",
    "import org.tribuo.data.csv.CSVLoader;\n",
    "import org.tribuo.classification.*;\n",
    "import org.tribuo.classification.evaluation.*;\n",
    "import org.tribuo.classification.sgd.linear.LogisticRegressionTrainer;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These imports are for the provenance system, which we'll get to in a minute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import com.fasterxml.jackson.databind.*;\n",
    "import com.oracle.labs.mlrg.olcut.provenance.ProvenanceUtil;\n",
    "import com.oracle.labs.mlrg.olcut.config.json.*;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading the data\n",
    "In Tribuo, all the prediction types have an associated `OutputFactory` implementation, which can create the appropriate `Output` subclasses from an input. Here we're going to use `LabelFactory` as we're performing multi-class classification. We then pass the `labelFactory` into the simple `CSVLoader` which reads all the columns into a `DataSource`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "var labelFactory = new LabelFactory();\n",
    "var csvLoader = new CSVLoader<>(labelFactory);"
   ]
  },
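  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what the factory does on its own, here is a small sketch that asks `labelFactory` to turn a raw string into a `Label`. It assumes `LabelFactory.generateOutput` accepts the raw value read from the response column, as in Tribuo 4.x; `Iris-setosa` is just one of the species values in the file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "// Convert a raw response value into a Label, presumably the same conversion\n",
    "// the CSV loader applies to each value in the species column\n",
    "Label setosa = labelFactory.generateOutput(\"Iris-setosa\");\n",
    "System.out.println(setosa.getLabel());"
   ]
  },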
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our copy of irises doesn't have any column headers, so we create them and pass them to the load method along with the path and the name of the output column (in this case `species`). Irises doesn't have a predefined train/test split, so we create one with `TrainTestSplitter`, using 70% of the data for training and a fixed seed (1) so the split is reproducible. Note that if your CSV file is more complicated than a table of numbers and a response column, you should use `CSVDataSource` to load it, and you might want to read the columnar data tutorial too."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "var irisHeaders = new String[]{\"sepalLength\", \"sepalWidth\", \"petalLength\", \"petalWidth\", \"species\"};\n",
    "var irisesSource = csvLoader.loadDataSource(Paths.get(\"bezdekIris.data\"),\"species\",irisHeaders);\n",
    "var irisSplitter = new TrainTestSplitter<>(irisesSource,0.7,1L);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We feed the training and test datasources into their respective datasets. These datasets compute all the necessary metadata, such as the feature domain and the output domain. For training datasets it's best to use a `MutableDataset`, as it can have transformations applied to it and its domains grow as more examples are added. Now that we have datasets, we're ready to train some models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training data size = 105, number of features = 4, number of classes = 3\n",
      "Testing data size = 45, number of features = 4, number of classes = 3\n"
     ]
    }
   ],
   "source": [
    "var trainingDataset = new MutableDataset<>(irisSplitter.getTrain());\n",
    "var testingDataset = new MutableDataset<>(irisSplitter.getTest());\n",
    "System.out.println(String.format(\"Training data size = %d, number of features = %d, number of classes = %d\",trainingDataset.size(),trainingDataset.getFeatureMap().size(),trainingDataset.getOutputInfo().size()));\n",
    "System.out.println(String.format(\"Testing data size = %d, number of features = %d, number of classes = %d\",testingDataset.size(),testingDataset.getFeatureMap().size(),testingDataset.getOutputInfo().size()));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training the model\n",
    "Now let's instantiate the trainer and see what its default hyperparameters are. For full control over these parameters you can use `LinearSGDTrainer` directly, which is fully configurable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "LinearSGDTrainer(objective=LogMulticlass,optimiser=AdaGrad(initialLearningRate=1.0,epsilon=0.1,initialValue=0.0),epochs=5,minibatchSize=1,seed=12345)\n"
     ]
    }
   ],
   "source": [
    "Trainer<Label> trainer = new LogisticRegressionTrainer();\n",
    "System.out.println(trainer.toString());"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So that's a linear model, using a logistic loss, trained with `AdaGrad` for 5 epochs.\n",
    "\n",
    "Now let's train the model. As with other packages, training is pretty simple when you have the training algorithm and training data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "Model<Label> irisModel = trainer.train(trainingDataset);"
   ]
  },
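  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since `LinearSGDTrainer` exposes all of these hyperparameters, the sketch below builds an equivalent trainer explicitly. It assumes the Tribuo 4.x constructors `LinearSGDTrainer(objective, optimiser, epochs, loggingInterval, minibatchSize, seed)` and `AdaGrad(initialLearningRate, epsilon)`. The values simply mirror the defaults printed earlier, so treat this as a starting point for tuning rather than a recommended configuration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import org.tribuo.classification.sgd.linear.LinearSGDTrainer;\n",
    "import org.tribuo.classification.sgd.objectives.LogMulticlass;\n",
    "import org.tribuo.math.optimisers.AdaGrad;\n",
    "\n",
    "// Assumed constructor order (Tribuo 4.x): objective, optimiser, epochs, loggingInterval, minibatchSize, seed.\n",
    "// These values reproduce the LogisticRegressionTrainer defaults: multinomial logistic loss,\n",
    "// AdaGrad with learning rate 1.0 and epsilon 0.1, 5 epochs, logging every 1000 examples,\n",
    "// minibatch size 1 and seed 12345.\n",
    "var explicitTrainer = new LinearSGDTrainer(new LogMulticlass(), new AdaGrad(1.0, 0.1), 5, 1000, 1, 12345L);\n",
    "System.out.println(explicitTrainer.toString());"
   ]
  },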
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluating the model\n",
    "Once we've trained a model, it's time to figure out how good it is. For this we can ask the `labelFactory` for the appropriate `Evaluator`, or instantiate a `LabelEvaluator` directly to get a sharper type, and then pass the evaluator the model and the test dataset. You can also supply a datasource instead of the dataset, or even a list of predictions if you've already generated them. The resulting `LabelEvaluation` provides most of the common classification metrics, each of which can be inspected individually, either per label or averaged across all the possible labels. `LabelEvaluation.toString()` produces a nicely formatted summary of the metrics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Class                           n          tp          fn          fp      recall        prec          f1\n",
      "Iris-versicolor                16          16           0           1       1.000       0.941       0.970\n",
      "Iris-virginica                 15          14           1           0       0.933       1.000       0.966\n",
      "Iris-setosa                    14          14           0           0       1.000       1.000       1.000\n",
      "Total                          45          44           1           1\n",
      "Accuracy                                                                    0.978\n",
      "Micro Average                                                               0.978       0.978       0.978\n",
      "Macro Average                                                               0.978       0.980       0.978\n",
      "Balanced Error Rate                                                         0.022\n"
     ]
    }
   ],
   "source": [
    "var evaluator = new LabelEvaluator();\n",
    "var evaluation = evaluator.evaluate(irisModel,testingDataset);\n",
    "System.out.println(evaluation.toString());"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This output lists:\n",
    "- the different classes in the test set\n",
    "- n, the number of ground truth labels of that class\n",
    "- tp, the number of true positives (i.e., the number of times the classifier correctly predicted this class)\n",
    "- fn, the number of false negatives (i.e., the number of times an example of this class was predicted as another class)\n",
    "- fp, the number of false positives (i.e., the number of times the classifier incorrectly predicted this class when it was another class)\n",
    "- recall, the true positives divided by the number of ground truth labels (i.e., the fraction of this class that the classifier can detect)\n",
    "- precision, the true positives divided by the predicted positives (i.e., the fraction of predictions of this class that are correct)\n",
    "- f1, the harmonic mean of precision and recall\n",
    "- accuracy, the sum of the true positives divided by the total number of test examples\n",
    "- micro average, each metric computed from the true positive, false positive and false negative counts pooled over all classes\n",
    "- macro average, the unweighted mean of the per-class values of each metric\n",
    "- balanced error rate, the average of the per-class error rates (i.e., one minus the macro-averaged recall)\n",
    "\n",
    "For probabilistic classifiers it's also possible to compute the [ROC curve](https://en.wikipedia.org/wiki/Receiver_operating_characteristic) and precision-recall curve.\n",
    "\n",
    "We can also print the confusion matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                   Iris-versicolor   Iris-virginica      Iris-setosa\n",
      "Iris-versicolor                 16                0                0\n",
      "Iris-virginica                   1               14                0\n",
      "Iris-setosa                      0                0               14\n",
      "\n"
     ]
    }
   ],
   "source": [
    "System.out.println(evaluation.getConfusionMatrix().toString());"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see there is a single misclassification, where an `Iris-virginica` is misclassified as an `Iris-versicolor`."
   ]
  },
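  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The numbers in the summary table can also be read individually from the `LabelEvaluation`. The sketch below assumes the accessors available on `LabelEvaluation` and its `ClassifierEvaluation` parent in Tribuo 4.x (`tp`, `fp`, `fn`, `precision`, `recall`, `f1`, `accuracy`, `macroAveragedF1` and `balancedErrorRate`), and recomputes precision and recall from the raw counts as a cross-check against the definitions given above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "// Per-label metrics: recompute precision and recall from the raw counts\n",
    "// and print them next to the values reported by the evaluation.\n",
    "for (String name : new String[]{\"Iris-setosa\", \"Iris-versicolor\", \"Iris-virginica\"}) {\n",
    "    Label label = new Label(name);\n",
    "    double tp = evaluation.tp(label);\n",
    "    double fp = evaluation.fp(label);\n",
    "    double fn = evaluation.fn(label);\n",
    "    System.out.printf(\"%-16s precision=%.3f (tp/(tp+fp)=%.3f) recall=%.3f (tp/(tp+fn)=%.3f) f1=%.3f%n\",\n",
    "        name, evaluation.precision(label), tp / (tp + fp),\n",
    "        evaluation.recall(label), tp / (tp + fn), evaluation.f1(label));\n",
    "}\n",
    "// Summary metrics from the bottom of the table\n",
    "System.out.printf(\"accuracy=%.3f macroF1=%.3f balancedErrorRate=%.3f%n\",\n",
    "    evaluation.accuracy(), evaluation.macroAveragedF1(), evaluation.balancedErrorRate());"
   ]
  },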
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model Metadata\n",
    "Tribuo tracks the feature and output domains of all constructed models. This means it's possible to run techniques like [LIME](https://dl.acm.org/doi/10.1145/2939672.2939778) without access to the original training data, and also to add checks that a particular input is within the bounds seen by the trained model.\n",
    "\n",
    "Let's look at the feature domain from our Irises model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CategoricalFeature(name=petalLength,id=0,count=105,map={1.2=1, 6.9=1, 3.6=1, 3.0=1, 1.7=4, 4.9=4, 4.4=3, 3.5=2, 5.9=2, 5.4=1, 4.0=4, 1.4=12, 4.5=4, 5.0=2, 5.5=3, 6.7=2, 3.7=1, 1.9=1, 6.0=2, 5.2=1, 5.7=2, 4.2=2, 4.7=2, 4.8=4, 1.6=4, 5.8=2, 3.8=1, 6.3=1, 3.3=1, 1.0=1, 5.6=4, 5.1=5, 4.6=3, 4.1=2, 1.5=9, 1.3=4, 3.9=3, 6.6=1, 6.1=2})\n",
      "\n",
      "CategoricalFeature(name=petalWidth,id=1,count=105,map={2.0=3, 0.5=1, 1.2=3, 0.3=6, 1.6=2, 0.1=3, 0.4=5, 2.5=3, 2.3=4, 1.7=2, 1.1=3, 2.1=4, 0.6=1, 1.4=6, 1.0=5, 2.4=1, 1.8=12, 0.2=20, 1.9=4, 1.5=7, 1.3=8, 2.2=2})\n",
      "\n",
      "CategoricalFeature(name=sepalLength,id=2,count=105,map={6.9=3, 6.4=3, 7.4=1, 4.9=4, 4.4=1, 5.9=3, 5.4=5, 7.2=3, 7.7=3, 5.0=8, 6.2=2, 5.5=5, 6.7=7, 6.0=3, 5.2=2, 6.5=3, 5.7=4, 4.7=2, 4.8=3, 5.8=4, 5.3=1, 6.8=3, 6.3=5, 7.3=1, 5.6=6, 5.1=7, 4.6=4, 7.6=1, 7.1=1, 6.6=2, 6.1=5})\n",
      "\n",
      "CategoricalFeature(name=sepalWidth,id=3,count=105,map={2.0=1, 2.8=10, 3.6=4, 2.3=3, 2.5=5, 3.1=8, 3.8=4, 3.0=19, 2.6=4, 4.4=1, 3.3=4, 3.5=4, 2.4=2, 3.2=10, 2.9=5, 3.7=3, 3.4=6, 2.2=2, 3.9=2, 4.2=1, 2.7=7})\n",
      "\n"
     ]
    }
   ],
   "source": [
    "var featureMap = irisModel.getFeatureIDMap();\n",
    "for (var v : featureMap) {\n",
    "    System.out.println(v.toString());\n",
    "    System.out.println();\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see the 4 features, along with a histogram of their values. This information can be used to sample from each feature to build candidate examples for local explainers like LIME, or to check whether an input value falls within the range seen during training. The feature information is frozen at model training time, so it also records how many times each feature occurred in the training set, which is particularly useful when the feature set is sparse (as is commonly the case in NLP problems)."
   ]
  },
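  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small sketch of using this metadata, the snippet below looks up individual features by name in the model's feature map. It assumes `FeatureMap.get(String)` returns `null` for a feature the model never saw, which gives one simple way to flag inputs outside the training domain. The name `stemLength` is a hypothetical feature used only to show a failed lookup."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "// Look up the training-time statistics for a known feature\n",
    "var petalLengthInfo = featureMap.get(\"petalLength\");\n",
    "System.out.println(petalLengthInfo.getName() + \" was observed \" + petalLengthInfo.getCount() + \" times during training\");\n",
    "\n",
    "// \"stemLength\" is a hypothetical feature name: the model never saw it, so there is no entry\n",
    "var unknownInfo = featureMap.get(\"stemLength\");\n",
    "System.out.println(\"stemLength known to the model? \" + (unknownInfo != null));"
   ]
  },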
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model Provenance\n",
    "\n",
    "Modern applications deploy many different kinds of ML models, each helping with a different aspect of the application. However, most ML packages don't provide good support for tracking and rebuilding models. In Tribuo each model tracks its provenance: it knows how it was created, when it was created, and what data was involved. Let's look at the data provenance for our irises model. By default Tribuo prints the provenance in a moderately human-readable format in each provenance object's `toString()`, but all the information is accessible programmatically."
   ]
  },
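  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before printing the full record, here is a sketch of pulling a few individual fields out of the `ModelProvenance`. It assumes the Tribuo 4.x accessors `getTrainingTime()`, `getDatasetProvenance().getNumExamples()` and `getTrainerProvenance().getClassName()`; check the Javadoc for your version if any of these differ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "// Accessor names assumed from the Tribuo 4.x ModelProvenance API.\n",
    "// Each field is available as a typed value, not just as formatted text.\n",
    "var modelProvenance = irisModel.getProvenance();\n",
    "System.out.println(\"Trained at:    \" + modelProvenance.getTrainingTime());\n",
    "System.out.println(\"Training size: \" + modelProvenance.getDatasetProvenance().getNumExamples());\n",
    "System.out.println(\"Trainer class: \" + modelProvenance.getTrainerProvenance().getClassName());"
   ]
  },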
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TrainTestSplitter(\n",
      "\tclass-name = org.tribuo.evaluation.TrainTestSplitter\n",
      "\tsource = CSVDataSource(\n",
      "\t\t\tclass-name = org.tribuo.data.csv.CSVDataSource\n",
      "\t\t\theaders = List[\n",
      "\t\t\t\tsepalLength\n",
      "\t\t\t\tsepalWidth\n",
      "\t\t\t\tpetalLength\n",
      "\t\t\t\tpetalWidth\n",
      "\t\t\t\tspecies\n",
      "\t\t\t]\n",
      "\t\t\trowProcessor = RowProcessor(\n",
      "\t\t\t\t\tclass-name = org.tribuo.data.columnar.RowProcessor\n",
      "\t\t\t\t\tmetadataExtractors = List[]\n",
      "\t\t\t\t\tfieldProcessorList = List[\n",
      "\t\t\t\t\t\tDoubleFieldProcessor(\n",
      "\t\t\t\t\t\t\t\t\tclass-name = org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\n",
      "\t\t\t\t\t\t\t\t\tfieldName = petalLength\n",
      "\t\t\t\t\t\t\t\t\tonlyFieldName = true\n",
      "\t\t\t\t\t\t\t\t\tthrowOnInvalid = true\n",
      "\t\t\t\t\t\t\t\t\thost-short-name = FieldProcessor\n",
      "\t\t\t\t\t\t\t\t)\n",
      "\t\t\t\t\t\tDoubleFieldProcessor(\n",
      "\t\t\t\t\t\t\t\t\tclass-name = org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\n",
      "\t\t\t\t\t\t\t\t\tfieldName = petalWidth\n",
      "\t\t\t\t\t\t\t\t\tonlyFieldName = true\n",
      "\t\t\t\t\t\t\t\t\tthrowOnInvalid = true\n",
      "\t\t\t\t\t\t\t\t\thost-short-name = FieldProcessor\n",
      "\t\t\t\t\t\t\t\t)\n",
      "\t\t\t\t\t\tDoubleFieldProcessor(\n",
      "\t\t\t\t\t\t\t\t\tclass-name = org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\n",
      "\t\t\t\t\t\t\t\t\tfieldName = sepalWidth\n",
      "\t\t\t\t\t\t\t\t\tonlyFieldName = true\n",
      "\t\t\t\t\t\t\t\t\tthrowOnInvalid = true\n",
      "\t\t\t\t\t\t\t\t\thost-short-name = FieldProcessor\n",
      "\t\t\t\t\t\t\t\t)\n",
      "\t\t\t\t\t\tDoubleFieldProcessor(\n",
      "\t\t\t\t\t\t\t\t\tclass-name = org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\n",
      "\t\t\t\t\t\t\t\t\tfieldName = sepalLength\n",
      "\t\t\t\t\t\t\t\t\tonlyFieldName = true\n",
      "\t\t\t\t\t\t\t\t\tthrowOnInvalid = true\n",
      "\t\t\t\t\t\t\t\t\thost-short-name = FieldProcessor\n",
      "\t\t\t\t\t\t\t\t)\n",
      "\t\t\t\t\t]\n",
      "\t\t\t\t\tfeatureProcessors = List[]\n",
      "\t\t\t\t\tresponseProcessor = FieldResponseProcessor(\n",
      "\t\t\t\t\t\t\tclass-name = org.tribuo.data.columnar.processors.response.FieldResponseProcessor\n",
      "\t\t\t\t\t\t\tuppercase = false\n",
      "\t\t\t\t\t\t\tfieldNames = List[\n",
      "\t\t\t\t\t\t\t\tspecies\n",
      "\t\t\t\t\t\t\t]\n",
      "\t\t\t\t\t\t\tdefaultValues = List[\n",
      "\t\t\t\t\t\t\t\t\n",
      "\t\t\t\t\t\t\t]\n",
      "\t\t\t\t\t\t\tdisplayField = false\n",
      "\t\t\t\t\t\t\toutputFactory = LabelFactory(\n",
      "\t\t\t\t\t\t\t\t\tclass-name = org.tribuo.classification.LabelFactory\n",
      "\t\t\t\t\t\t\t\t)\n",
      "\t\t\t\t\t\t\thost-short-name = ResponseProcessor\n",
      "\t\t\t\t\t\t)\n",
      "\t\t\t\t\tweightExtractor = FieldExtractor(\n",
      "\t\t\t\t\t\t\tclass-name = org.tribuo.data.columnar.FieldExtractor\n",
      "\t\t\t\t\t\t)\n",
      "\t\t\t\t\treplaceNewlinesWithSpaces = true\n",
      "\t\t\t\t\tregexMappingProcessors = Map{}\n",
      "\t\t\t\t\thost-short-name = RowProcessor\n",
      "\t\t\t\t)\n",
      "\t\t\tquote = \"\n",
      "\t\t\toutputRequired = true\n",
      "\t\t\toutputFactory = LabelFactory(\n",
      "\t\t\t\t\tclass-name = org.tribuo.classification.LabelFactory\n",
      "\t\t\t\t)\n",
      "\t\t\tseparator = ,\n",
      "\t\t\tdataPath = /local/ExternalRepositories/tribuo/tutorials/bezdekIris.data\n",
      "\t\t\tresource-hash = 0FED2A99DB77EC533A62DC66894D3EC6DF3B58B6A8F3CF4A6B47E4086B7F97DC\n",
      "\t\t\tfile-modified-time = 1999-12-14T15:12:39-05:00\n",
      "\t\t\tdatasource-creation-time = 2022-10-07T11:20:06.279351-04:00\n",
      "\t\t\thost-short-name = DataSource\n",
      "\t\t)\n",
      "\ttrain-proportion = 0.7\n",
      "\tseed = 1\n",
      "\tsize = 150\n",
      "\tis-train = true\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "var provenance = irisModel.getProvenance();\n",
    "System.out.println(ProvenanceUtil.formattedProvenanceString(provenance.getDatasetProvenance().getSourceProvenance()));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see the model was trained on a datasource which was split into training and test portions, using a specific random seed and training proportion. The original datasource was a CSV file, and the file modified time and SHA-256 hash are recorded too. As of Tribuo v4.2, `CSVLoader` generates a `CSVDataSource`, which allows simpler migration to more complex columnar processing than the old method and produces more accurate provenance information suitable for automatic reproduction of models.\n",
    "\n",
    "We can similarly inspect the trainer provenance to find out about the training algorithm."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "LogisticRegressionTrainer(\n",
      "\tclass-name = org.tribuo.classification.sgd.linear.LogisticRegressionTrainer\n",
      "\tseed = 12345\n",
      "\tminibatchSize = 1\n",
      "\tshuffle = true\n",
      "\tepochs = 5\n",
      "\toptimiser = AdaGrad(\n",
      "\t\t\tclass-name = org.tribuo.math.optimisers.AdaGrad\n",
      "\t\t\tepsilon = 0.1\n",
      "\t\t\tinitialLearningRate = 1.0\n",
      "\t\t\tinitialValue = 0.0\n",
      "\t\t\thost-short-name = StochasticGradientOptimiser\n",
      "\t\t)\n",
      "\tloggingInterval = 1000\n",
      "\tobjective = LogMulticlass(\n",
      "\t\t\tclass-name = org.tribuo.classification.sgd.objectives.LogMulticlass\n",
      "\t\t\thost-short-name = LabelObjective\n",
      "\t\t)\n",
      "\ttribuo-version = 4.3.0\n",
      "\ttrain-invocation-count = 0\n",
      "\tis-sequence = false\n",
      "\thost-short-name = Trainer\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "System.out.println(ProvenanceUtil.formattedProvenanceString(provenance.getTrainerProvenance()));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As expected, our model was trained using a `LogisticRegressionTrainer`, which used `AdaGrad` as its stochastic gradient optimiser.\n",
    "\n",
    "Provenance can be extracted from models and stored as JSON files, if you wish to keep a separate record (or redact the provenance from a deployed model)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
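    "// Jackson ObjectMapper with OLCUT's JSON provenance module registered,\n",
    "// so marshalled provenance objects can be written out as JSON\n",
    "ObjectMapper objMapper = new ObjectMapper();\n",
    "objMapper.registerModule(new JsonProvenanceModule());\n",
    "// Pretty-print the JSON output\n",
    "objMapper = objMapper.enable(SerializationFeature.INDENT_OUTPUT);"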
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The JSON provenance is verbose, but it provides an alternative human-readable serialization format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ {\n",
      "  \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ObjectMarshalledProvenance\",\n",
      "  \"object-name\" : \"linearsgdmodel-0\",\n",
      "  \"object-class-name\" : \"org.tribuo.classification.sgd.linear.LinearSGDModel\",\n",
      "  \"provenance-class\" : \"org.tribuo.provenance.ModelProvenance\",\n",
      "  \"map\" : {\n",
      "    \"instance-values\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.MapMarshalledProvenance\",\n",
      "      \"map\" : { }\n",
      "    },\n",
      "    \"tribuo-version\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"tribuo-version\",\n",
      "      \"value\" : \"4.3.0\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"java-version\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"java-version\",\n",
      "      \"value\" : \"12\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"trainer\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"trainer\",\n",
      "      \"value\" : \"logisticregressiontrainer-2\",\n",
      "      \"provenance-class\" : \"org.tribuo.provenance.impl.TrainerProvenanceImpl\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : true\n",
      "    },\n",
      "    \"os-arch\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"os-arch\",\n",
      "      \"value\" : \"amd64\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"trained-at\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"trained-at\",\n",
      "      \"value\" : \"2022-10-07T11:20:06.643297-04:00\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.DateTimeProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"os-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"os-name\",\n",
      "      \"value\" : \"Linux\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"dataset\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"dataset\",\n",
      "      \"value\" : \"mutabledataset-1\",\n",
      "      \"provenance-class\" : \"org.tribuo.provenance.DatasetProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : true\n",
      "    },\n",
      "    \"class-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"class-name\",\n",
      "      \"value\" : \"org.tribuo.classification.sgd.linear.LinearSGDModel\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    }\n",
      "  }\n",
      "}, {\n",
      "  \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ObjectMarshalledProvenance\",\n",
      "  \"object-name\" : \"mutabledataset-1\",\n",
      "  \"object-class-name\" : \"org.tribuo.MutableDataset\",\n",
      "  \"provenance-class\" : \"org.tribuo.provenance.DatasetProvenance\",\n",
      "  \"map\" : {\n",
      "    \"num-features\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"num-features\",\n",
      "      \"value\" : \"4\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.IntProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"num-examples\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"num-examples\",\n",
      "      \"value\" : \"105\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.IntProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"num-outputs\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"num-outputs\",\n",
      "      \"value\" : \"3\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.IntProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"tribuo-version\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"tribuo-version\",\n",
      "      \"value\" : \"4.3.0\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"datasource\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"datasource\",\n",
      "      \"value\" : \"traintestsplitter-3\",\n",
      "      \"provenance-class\" : \"org.tribuo.evaluation.TrainTestSplitter$SplitDataSourceProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : true\n",
      "    },\n",
      "    \"transformations\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ListMarshalledProvenance\",\n",
      "      \"list\" : [ ]\n",
      "    },\n",
      "    \"is-sequence\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"is-sequence\",\n",
      "      \"value\" : \"false\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.BooleanProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"is-dense\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"is-dense\",\n",
      "      \"value\" : \"true\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.BooleanProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"class-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"class-name\",\n",
      "      \"value\" : \"org.tribuo.MutableDataset\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    }\n",
      "  }\n",
      "}, {\n",
      "  \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ObjectMarshalledProvenance\",\n",
      "  \"object-name\" : \"logisticregressiontrainer-2\",\n",
      "  \"object-class-name\" : \"org.tribuo.classification.sgd.linear.LogisticRegressionTrainer\",\n",
      "  \"provenance-class\" : \"org.tribuo.provenance.impl.TrainerProvenanceImpl\",\n",
      "  \"map\" : {\n",
      "    \"seed\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"seed\",\n",
      "      \"value\" : \"12345\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.LongProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"tribuo-version\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"tribuo-version\",\n",
      "      \"value\" : \"4.3.0\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"minibatchSize\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"minibatchSize\",\n",
      "      \"value\" : \"1\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.IntProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"train-invocation-count\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"train-invocation-count\",\n",
      "      \"value\" : \"0\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.IntProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"is-sequence\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"is-sequence\",\n",
      "      \"value\" : \"false\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.BooleanProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"shuffle\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"shuffle\",\n",
      "      \"value\" : \"true\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.BooleanProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"epochs\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"epochs\",\n",
      "      \"value\" : \"5\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.IntProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"optimiser\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"optimiser\",\n",
      "      \"value\" : \"adagrad-4\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.impl.ConfiguredObjectProvenanceImpl\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : true\n",
      "    },\n",
      "    \"host-short-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"host-short-name\",\n",
      "      \"value\" : \"Trainer\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"class-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"class-name\",\n",
      "      \"value\" : \"org.tribuo.classification.sgd.linear.LogisticRegressionTrainer\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"loggingInterval\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"loggingInterval\",\n",
      "      \"value\" : \"1000\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.IntProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"objective\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"objective\",\n",
      "      \"value\" : \"logmulticlass-5\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.impl.ConfiguredObjectProvenanceImpl\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : true\n",
      "    }\n",
      "  }\n",
      "}, {\n",
      "  \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ObjectMarshalledProvenance\",\n",
      "  \"object-name\" : \"traintestsplitter-3\",\n",
      "  \"object-class-name\" : \"org.tribuo.evaluation.TrainTestSplitter\",\n",
      "  \"provenance-class\" : \"org.tribuo.evaluation.TrainTestSplitter$SplitDataSourceProvenance\",\n",
      "  \"map\" : {\n",
      "    \"train-proportion\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"train-proportion\",\n",
      "      \"value\" : \"0.7\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.DoubleProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"seed\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"seed\",\n",
      "      \"value\" : \"1\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.LongProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"size\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"size\",\n",
      "      \"value\" : \"150\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.IntProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"source\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"source\",\n",
      "      \"value\" : \"csvdatasource-6\",\n",
      "      \"provenance-class\" : \"org.tribuo.data.csv.CSVDataSource$CSVDataSourceProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : true\n",
      "    },\n",
      "    \"class-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"class-name\",\n",
      "      \"value\" : \"org.tribuo.evaluation.TrainTestSplitter\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"is-train\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"is-train\",\n",
      "      \"value\" : \"true\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.BooleanProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    }\n",
      "  }\n",
      "}, {\n",
      "  \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ObjectMarshalledProvenance\",\n",
      "  \"object-name\" : \"adagrad-4\",\n",
      "  \"object-class-name\" : \"org.tribuo.math.optimisers.AdaGrad\",\n",
      "  \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.impl.ConfiguredObjectProvenanceImpl\",\n",
      "  \"map\" : {\n",
      "    \"epsilon\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"epsilon\",\n",
      "      \"value\" : \"0.1\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.DoubleProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"initialLearningRate\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"initialLearningRate\",\n",
      "      \"value\" : \"1.0\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.DoubleProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"initialValue\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"initialValue\",\n",
      "      \"value\" : \"0.0\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.DoubleProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"host-short-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"host-short-name\",\n",
      "      \"value\" : \"StochasticGradientOptimiser\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"class-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"class-name\",\n",
      "      \"value\" : \"org.tribuo.math.optimisers.AdaGrad\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    }\n",
      "  }\n",
      "}, {\n",
      "  \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ObjectMarshalledProvenance\",\n",
      "  \"object-name\" : \"logmulticlass-5\",\n",
      "  \"object-class-name\" : \"org.tribuo.classification.sgd.objectives.LogMulticlass\",\n",
      "  \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.impl.ConfiguredObjectProvenanceImpl\",\n",
      "  \"map\" : {\n",
      "    \"host-short-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"host-short-name\",\n",
      "      \"value\" : \"LabelObjective\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"class-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"class-name\",\n",
      "      \"value\" : \"org.tribuo.classification.sgd.objectives.LogMulticlass\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    }\n",
      "  }\n",
      "}, {\n",
      "  \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ObjectMarshalledProvenance\",\n",
      "  \"object-name\" : \"csvdatasource-6\",\n",
      "  \"object-class-name\" : \"org.tribuo.data.csv.CSVDataSource\",\n",
      "  \"provenance-class\" : \"org.tribuo.data.csv.CSVDataSource$CSVDataSourceProvenance\",\n",
      "  \"map\" : {\n",
      "    \"resource-hash\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"resource-hash\",\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "      \"value\" : \"0FED2A99DB77EC533A62DC66894D3EC6DF3B58B6A8F3CF4A6B47E4086B7F97DC\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.HashProvenance\",\n",
      "      \"additional\" : \"SHA256\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"headers\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ListMarshalledProvenance\",\n",
      "      \"list\" : [ {\n",
      "        \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "        \"key\" : \"headers\",\n",
      "        \"value\" : \"sepalLength\",\n",
      "        \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "        \"additional\" : \"\",\n",
      "        \"is-reference\" : false\n",
      "      }, {\n",
      "        \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "        \"key\" : \"headers\",\n",
      "        \"value\" : \"sepalWidth\",\n",
      "        \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "        \"additional\" : \"\",\n",
      "        \"is-reference\" : false\n",
      "      }, {\n",
      "        \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "        \"key\" : \"headers\",\n",
      "        \"value\" : \"petalLength\",\n",
      "        \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "        \"additional\" : \"\",\n",
      "        \"is-reference\" : false\n",
      "      }, {\n",
      "        \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "        \"key\" : \"headers\",\n",
      "        \"value\" : \"petalWidth\",\n",
      "        \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "        \"additional\" : \"\",\n",
      "        \"is-reference\" : false\n",
      "      }, {\n",
      "        \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "        \"key\" : \"headers\",\n",
      "        \"value\" : \"species\",\n",
      "        \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "        \"additional\" : \"\",\n",
      "        \"is-reference\" : false\n",
      "      } ]\n",
      "    },\n",
      "    \"rowProcessor\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"rowProcessor\",\n",
      "      \"value\" : \"rowprocessor-7\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.impl.ConfiguredObjectProvenanceImpl\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : true\n",
      "    },\n",
      "    \"file-modified-time\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"file-modified-time\",\n",
      "      \"value\" : \"1999-12-14T15:12:39-05:00\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.DateTimeProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"quote\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"quote\",\n",
      "      \"value\" : \"\\\"\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.CharProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"outputRequired\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"outputRequired\",\n",
      "      \"value\" : \"true\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.BooleanProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"datasource-creation-time\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"datasource-creation-time\",\n",
      "      \"value\" : \"2022-10-07T11:20:06.279351-04:00\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.DateTimeProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"outputFactory\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"outputFactory\",\n",
      "      \"value\" : \"labelfactory-15\",\n",
      "      \"provenance-class\" : \"org.tribuo.classification.LabelFactory$LabelFactoryProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : true\n",
      "    },\n",
      "    \"separator\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"separator\",\n",
      "      \"value\" : \",\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.CharProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"host-short-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"host-short-name\",\n",
      "      \"value\" : \"DataSource\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"class-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"class-name\",\n",
      "      \"value\" : \"org.tribuo.data.csv.CSVDataSource\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"dataPath\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"dataPath\",\n",
      "      \"value\" : \"/local/ExternalRepositories/tribuo/tutorials/bezdekIris.data\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.FileProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    }\n",
      "  }\n",
      "}, {\n",
      "  \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ObjectMarshalledProvenance\",\n",
      "  \"object-name\" : \"rowprocessor-7\",\n",
      "  \"object-class-name\" : \"org.tribuo.data.columnar.RowProcessor\",\n",
      "  \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.impl.ConfiguredObjectProvenanceImpl\",\n",
      "  \"map\" : {\n",
      "    \"metadataExtractors\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ListMarshalledProvenance\",\n",
      "      \"list\" : [ ]\n",
      "    },\n",
      "    \"fieldProcessorList\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ListMarshalledProvenance\",\n",
      "      \"list\" : [ {\n",
      "        \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "        \"key\" : \"fieldProcessorList\",\n",
      "        \"value\" : \"doublefieldprocessor-9\",\n",
      "        \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.impl.ConfiguredObjectProvenanceImpl\",\n",
      "        \"additional\" : \"\",\n",
      "        \"is-reference\" : true\n",
      "      }, {\n",
      "        \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "        \"key\" : \"fieldProcessorList\",\n",
      "        \"value\" : \"doublefieldprocessor-10\",\n",
      "        \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.impl.ConfiguredObjectProvenanceImpl\",\n",
      "        \"additional\" : \"\",\n",
      "        \"is-reference\" : true\n",
      "      }, {\n",
      "        \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "        \"key\" : \"fieldProcessorList\",\n",
      "        \"value\" : \"doublefieldprocessor-11\",\n",
      "        \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.impl.ConfiguredObjectProvenanceImpl\",\n",
      "        \"additional\" : \"\",\n",
      "        \"is-reference\" : true\n",
      "      }, {\n",
      "        \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "        \"key\" : \"fieldProcessorList\",\n",
      "        \"value\" : \"doublefieldprocessor-12\",\n",
      "        \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.impl.ConfiguredObjectProvenanceImpl\",\n",
      "        \"additional\" : \"\",\n",
      "        \"is-reference\" : true\n",
      "      } ]\n",
      "    },\n",
      "    \"featureProcessors\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ListMarshalledProvenance\",\n",
      "      \"list\" : [ ]\n",
      "    },\n",
      "    \"responseProcessor\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"responseProcessor\",\n",
      "      \"value\" : \"fieldresponseprocessor-13\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.impl.ConfiguredObjectProvenanceImpl\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : true\n",
      "    },\n",
      "    \"weightExtractor\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "      \"key\" : \"weightExtractor\",\n",
      "      \"value\" : \"fieldextractor-14\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.impl.NullConfiguredProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : true\n",
      "    },\n",
      "    \"replaceNewlinesWithSpaces\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"replaceNewlinesWithSpaces\",\n",
      "      \"value\" : \"true\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.BooleanProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"regexMappingProcessors\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.MapMarshalledProvenance\",\n",
      "      \"map\" : { }\n",
      "    },\n",
      "    \"host-short-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"host-short-name\",\n",
      "      \"value\" : \"RowProcessor\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"class-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"class-name\",\n",
      "      \"value\" : \"org.tribuo.data.columnar.RowProcessor\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    }\n",
      "  }\n",
      "}, {\n",
      "  \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ObjectMarshalledProvenance\",\n",
      "  \"object-name\" : \"labelfactory-15\",\n",
      "  \"object-class-name\" : \"org.tribuo.classification.LabelFactory\",\n",
      "  \"provenance-class\" : \"org.tribuo.classification.LabelFactory$LabelFactoryProvenance\",\n",
      "  \"map\" : {\n",
      "    \"class-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"class-name\",\n",
      "      \"value\" : \"org.tribuo.classification.LabelFactory\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    }\n",
      "  }\n",
      "}, {\n",
      "  \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ObjectMarshalledProvenance\",\n",
      "  \"object-name\" : \"doublefieldprocessor-9\",\n",
      "  \"object-class-name\" : \"org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\",\n",
      "  \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.impl.ConfiguredObjectProvenanceImpl\",\n",
      "  \"map\" : {\n",
      "    \"fieldName\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"fieldName\",\n",
      "      \"value\" : \"petalLength\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"onlyFieldName\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"onlyFieldName\",\n",
      "      \"value\" : \"true\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.BooleanProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"throwOnInvalid\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"throwOnInvalid\",\n",
      "      \"value\" : \"true\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.BooleanProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"host-short-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"host-short-name\",\n",
      "      \"value\" : \"FieldProcessor\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"class-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"class-name\",\n",
      "      \"value\" : \"org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    }\n",
      "  }\n",
      "}, {\n",
      "  \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ObjectMarshalledProvenance\",\n",
      "  \"object-name\" : \"doublefieldprocessor-10\",\n",
      "  \"object-class-name\" : \"org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\",\n",
      "  \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.impl.ConfiguredObjectProvenanceImpl\",\n",
      "  \"map\" : {\n",
      "    \"fieldName\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"fieldName\",\n",
      "      \"value\" : \"petalWidth\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"onlyFieldName\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"onlyFieldName\",\n",
      "      \"value\" : \"true\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.BooleanProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"throwOnInvalid\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"throwOnInvalid\",\n",
      "      \"value\" : \"true\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.BooleanProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"host-short-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"host-short-name\",\n",
      "      \"value\" : \"FieldProcessor\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"class-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"class-name\",\n",
      "      \"value\" : \"org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    }\n",
      "  }\n",
      "}, {\n",
      "  \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ObjectMarshalledProvenance\",\n",
      "  \"object-name\" : \"doublefieldprocessor-11\",\n",
      "  \"object-class-name\" : \"org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\",\n",
      "  \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.impl.ConfiguredObjectProvenanceImpl\",\n",
      "  \"map\" : {\n",
      "    \"fieldName\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"fieldName\",\n",
      "      \"value\" : \"sepalWidth\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"onlyFieldName\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"onlyFieldName\",\n",
      "      \"value\" : \"true\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.BooleanProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"throwOnInvalid\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"throwOnInvalid\",\n",
      "      \"value\" : \"true\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.BooleanProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"host-short-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"host-short-name\",\n",
      "      \"value\" : \"FieldProcessor\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"class-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"class-name\",\n",
      "      \"value\" : \"org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    }\n",
      "  }\n",
      "}, {\n",
      "  \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ObjectMarshalledProvenance\",\n",
      "  \"object-name\" : \"doublefieldprocessor-12\",\n",
      "  \"object-class-name\" : \"org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\",\n",
      "  \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.impl.ConfiguredObjectProvenanceImpl\",\n",
      "  \"map\" : {\n",
      "    \"fieldName\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"fieldName\",\n",
      "      \"value\" : \"sepalLength\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"onlyFieldName\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"onlyFieldName\",\n",
      "      \"value\" : \"true\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.BooleanProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"throwOnInvalid\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"throwOnInvalid\",\n",
      "      \"value\" : \"true\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.BooleanProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"host-short-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"host-short-name\",\n",
      "      \"value\" : \"FieldProcessor\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"class-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"class-name\",\n",
      "      \"value\" : \"org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    }\n",
      "  }\n",
      "}, {\n",
      "  \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ObjectMarshalledProvenance\",\n",
      "  \"object-name\" : \"fieldresponseprocessor-13\",\n",
      "  \"object-class-name\" : \"org.tribuo.data.columnar.processors.response.FieldResponseProcessor\",\n",
      "  \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.impl.ConfiguredObjectProvenanceImpl\",\n",
      "  \"map\" : {\n",
      "    \"uppercase\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"uppercase\",\n",
      "      \"value\" : \"false\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.BooleanProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"fieldNames\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ListMarshalledProvenance\",\n",
      "      \"list\" : [ {\n",
      "        \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "        \"key\" : \"fieldNames\",\n",
      "        \"value\" : \"species\",\n",
      "        \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "        \"additional\" : \"\",\n",
      "        \"is-reference\" : false\n",
      "      } ]\n",
      "    },\n",
      "    \"defaultValues\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ListMarshalledProvenance\",\n",
      "      \"list\" : [ {\n",
      "        \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "        \"key\" : \"defaultValues\",\n",
      "        \"value\" : \"\",\n",
      "        \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "        \"additional\" : \"\",\n",
      "        \"is-reference\" : false\n",
      "      } ]\n",
      "    },\n",
      "    \"displayField\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"displayField\",\n",
      "      \"value\" : \"false\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.BooleanProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"outputFactory\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"outputFactory\",\n",
      "      \"value\" : \"labelfactory-15\",\n",
      "      \"provenance-class\" : \"org.tribuo.classification.LabelFactory$LabelFactoryProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : true\n",
      "    },\n",
      "    \"host-short-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"host-short-name\",\n",
      "      \"value\" : \"ResponseProcessor\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    },\n",
      "    \"class-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"class-name\",\n",
      "      \"value\" : \"org.tribuo.data.columnar.processors.response.FieldResponseProcessor\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    }\n",
      "  }\n",
      "}, {\n",
      "  \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.ObjectMarshalledProvenance\",\n",
      "  \"object-name\" : \"fieldextractor-14\",\n",
      "  \"object-class-name\" : \"org.tribuo.data.columnar.FieldExtractor\",\n",
      "  \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.impl.NullConfiguredProvenance\",\n",
      "  \"map\" : {\n",
      "    \"class-name\" : {\n",
      "      \"marshalled-class\" : \"com.oracle.labs.mlrg.olcut.provenance.io.SimpleMarshalledProvenance\",\n",
      "      \"key\" : \"class-name\",\n",
      "      \"value\" : \"org.tribuo.data.columnar.FieldExtractor\",\n",
      "      \"provenance-class\" : \"com.oracle.labs.mlrg.olcut.provenance.primitives.StringProvenance\",\n",
      "      \"additional\" : \"\",\n",
      "      \"is-reference\" : false\n",
      "    }\n",
      "  }\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "} ]\n"
     ]
    }
   ],
   "source": [
    "String jsonProvenance = objMapper.writeValueAsString(ProvenanceUtil.marshalProvenance(provenance));\n",
    "System.out.println(jsonProvenance);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, the model provenance is also present in the output of `Model.toString()`, though this format is not machine readable (or particularly human readable, for that matter)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "linear-sgd-model - Model(class-name=org.tribuo.classification.sgd.linear.LinearSGDModel,dataset=Dataset(class-name=org.tribuo.MutableDataset,datasource=SplitDataSourceProvenance(className=org.tribuo.evaluation.TrainTestSplitter,innerSourceProvenance=DataSource(class-name=org.tribuo.data.csv.CSVDataSource,headers=[sepalLength, sepalWidth, petalLength, petalWidth, species],rowProcessor=RowProcessor(class-name=org.tribuo.data.columnar.RowProcessor,metadataExtractors=[],fieldProcessorList=[FieldProcessor(class-name=org.tribuo.data.columnar.processors.field.DoubleFieldProcessor,fieldName=petalLength,onlyFieldName=true,throwOnInvalid=true,host-short-name=FieldProcessor), FieldProcessor(class-name=org.tribuo.data.columnar.processors.field.DoubleFieldProcessor,fieldName=petalWidth,onlyFieldName=true,throwOnInvalid=true,host-short-name=FieldProcessor), FieldProcessor(class-name=org.tribuo.data.columnar.processors.field.DoubleFieldProcessor,fieldName=sepalWidth,onlyFieldName=true,throwOnInvalid=true,host-short-name=FieldProcessor), 
FieldProcessor(class-name=org.tribuo.data.columnar.processors.field.DoubleFieldProcessor,fieldName=sepalLength,onlyFieldName=true,throwOnInvalid=true,host-short-name=FieldProcessor)],featureProcessors=[],responseProcessor=ResponseProcessor(class-name=org.tribuo.data.columnar.processors.response.FieldResponseProcessor,uppercase=false,fieldNames=[species],defaultValues=[],displayField=false,outputFactory=OutputFactory(class-name=org.tribuo.classification.LabelFactory),host-short-name=ResponseProcessor),weightExtractor=null,replaceNewlinesWithSpaces=true,regexMappingProcessors={},host-short-name=RowProcessor),quote=\",outputRequired=true,outputFactory=OutputFactory(class-name=org.tribuo.classification.LabelFactory),separator=,,dataPath=/local/ExternalRepositories/tribuo/tutorials/bezdekIris.data,resource-hash=SHA-256[0FED2A99DB77EC533A62DC66894D3EC6DF3B58B6A8F3CF4A6B47E4086B7F97DC],file-modified-time=1999-12-14T15:12:39-05:00,datasource-creation-time=2022-10-07T11:20:06.279351-04:00,host-short-name=DataSource),trainProportion=0.7,seed=1,size=150,isTrain=true),transformations=[],is-sequence=false,is-dense=true,num-examples=105,num-features=4,num-outputs=3,tribuo-version=4.3.0),trainer=Trainer(class-name=org.tribuo.classification.sgd.linear.LogisticRegressionTrainer,seed=12345,minibatchSize=1,shuffle=true,epochs=5,optimiser=StochasticGradientOptimiser(class-name=org.tribuo.math.optimisers.AdaGrad,epsilon=0.1,initialLearningRate=1.0,initialValue=0.0,host-short-name=StochasticGradientOptimiser),loggingInterval=1000,objective=LabelObjective(class-name=org.tribuo.classification.sgd.objectives.LogMulticlass,host-short-name=LabelObjective),tribuo-version=4.3.0,train-invocation-count=0,is-sequence=false,host-short-name=Trainer),trained-at=2022-10-07T11:20:06.643297-04:00,instance-values={},tribuo-version=4.3.0,java-version=12,os-name=Linux,os-arch=amd64)\n"
     ]
    }
   ],
   "source": [
    "System.out.println(irisModel.toString());"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Evaluations also have a provenance that records the model provenance along with the test data provenance. Here we use an alternate form of the JSON provenance that's easier to read, though a little less precise. This form is suitable for reference, but it can't be used to reconstruct the original provenance object because every value has been converted into a String."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"tribuo-version\" : \"4.3.0\",\n",
      "  \"dataset-provenance\" : {\n",
      "    \"num-features\" : \"4\",\n",
      "    \"num-examples\" : \"45\",\n",
      "    \"num-outputs\" : \"3\",\n",
      "    \"tribuo-version\" : \"4.3.0\",\n",
      "    \"datasource\" : {\n",
      "      \"train-proportion\" : \"0.7\",\n",
      "      \"seed\" : \"1\",\n",
      "      \"size\" : \"150\",\n",
      "      \"source\" : {\n",
      "        \"resource-hash\" : \"0FED2A99DB77EC533A62DC66894D3EC6DF3B58B6A8F3CF4A6B47E4086B7F97DC\",\n",
      "        \"headers\" : [ \"sepalLength\", \"sepalWidth\", \"petalLength\", \"petalWidth\", \"species\" ],\n",
      "        \"rowProcessor\" : {\n",
      "          \"metadataExtractors\" : [ ],\n",
      "          \"fieldProcessorList\" : [ {\n",
      "            \"fieldName\" : \"petalLength\",\n",
      "            \"onlyFieldName\" : \"true\",\n",
      "            \"throwOnInvalid\" : \"true\",\n",
      "            \"host-short-name\" : \"FieldProcessor\",\n",
      "            \"class-name\" : \"org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\"\n",
      "          }, {\n",
      "            \"fieldName\" : \"petalWidth\",\n",
      "            \"onlyFieldName\" : \"true\",\n",
      "            \"throwOnInvalid\" : \"true\",\n",
      "            \"host-short-name\" : \"FieldProcessor\",\n",
      "            \"class-name\" : \"org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\"\n",
      "          }, {\n",
      "            \"fieldName\" : \"sepalWidth\",\n",
      "            \"onlyFieldName\" : \"true\",\n",
      "            \"throwOnInvalid\" : \"true\",\n",
      "            \"host-short-name\" : \"FieldProcessor\",\n",
      "            \"class-name\" : \"org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\"\n",
      "          }, {\n",
      "            \"fieldName\" : \"sepalLength\",\n",
      "            \"onlyFieldName\" : \"true\",\n",
      "            \"throwOnInvalid\" : \"true\",\n",
      "            \"host-short-name\" : \"FieldProcessor\",\n",
      "            \"class-name\" : \"org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\"\n",
      "          } ],\n",
      "          \"featureProcessors\" : [ ],\n",
      "          \"responseProcessor\" : {\n",
      "            \"uppercase\" : \"false\",\n",
      "            \"fieldNames\" : [ \"species\" ],\n",
      "            \"defaultValues\" : [ \"\" ],\n",
      "            \"displayField\" : \"false\",\n",
      "            \"outputFactory\" : {\n",
      "              \"class-name\" : \"org.tribuo.classification.LabelFactory\"\n",
      "            },\n",
      "            \"host-short-name\" : \"ResponseProcessor\",\n",
      "            \"class-name\" : \"org.tribuo.data.columnar.processors.response.FieldResponseProcessor\"\n",
      "          },\n",
      "          \"weightExtractor\" : {\n",
      "            \"class-name\" : \"org.tribuo.data.columnar.FieldExtractor\"\n",
      "          },\n",
      "          \"replaceNewlinesWithSpaces\" : \"true\",\n",
      "          \"regexMappingProcessors\" : { },\n",
      "          \"host-short-name\" : \"RowProcessor\",\n",
      "          \"class-name\" : \"org.tribuo.data.columnar.RowProcessor\"\n",
      "        },\n",
      "        \"file-modified-time\" : \"1999-12-14T15:12:39-05:00\",\n",
      "        \"quote\" : \"\\\"\",\n",
      "        \"outputRequired\" : \"true\",\n",
      "        \"datasource-creation-time\" : \"2022-10-07T11:20:06.279351-04:00\",\n",
      "        \"outputFactory\" : {\n",
      "          \"class-name\" : \"org.tribuo.classification.LabelFactory\"\n",
      "        },\n",
      "        \"separator\" : \",\",\n",
      "        \"host-short-name\" : \"DataSource\",\n",
      "        \"class-name\" : \"org.tribuo.data.csv.CSVDataSource\",\n",
      "        \"dataPath\" : \"/local/ExternalRepositories/tribuo/tutorials/bezdekIris.data\"\n",
      "      },\n",
      "      \"class-name\" : \"org.tribuo.evaluation.TrainTestSplitter\",\n",
      "      \"is-train\" : \"false\"\n",
      "    },\n",
      "    \"transformations\" : [ ],\n",
      "    \"is-sequence\" : \"false\",\n",
      "    \"is-dense\" : \"true\",\n",
      "    \"class-name\" : \"org.tribuo.MutableDataset\"\n",
      "  },\n",
      "  \"class-name\" : \"org.tribuo.provenance.EvaluationProvenance\",\n",
      "  \"model-provenance\" : {\n",
      "    \"instance-values\" : { },\n",
      "    \"tribuo-version\" : \"4.3.0\",\n",
      "    \"java-version\" : \"12\",\n",
      "    \"trainer\" : {\n",
      "      \"seed\" : \"12345\",\n",
      "      \"tribuo-version\" : \"4.3.0\",\n",
      "      \"minibatchSize\" : \"1\",\n",
      "      \"train-invocation-count\" : \"0\",\n",
      "      \"is-sequence\" : \"false\",\n",
      "      \"shuffle\" : \"true\",\n",
      "      \"epochs\" : \"5\",\n",
      "      \"optimiser\" : {\n",
      "        \"epsilon\" : \"0.1\",\n",
      "        \"initialLearningRate\" : \"1.0\",\n",
      "        \"initialValue\" : \"0.0\",\n",
      "        \"host-short-name\" : \"StochasticGradientOptimiser\",\n",
      "        \"class-name\" : \"org.tribuo.math.optimisers.AdaGrad\"\n",
      "      },\n",
      "      \"host-short-name\" : \"Trainer\",\n",
      "      \"class-name\" : \"org.tribuo.classification.sgd.linear.LogisticRegressionTrainer\",\n",
      "      \"loggingInterval\" : \"1000\",\n",
      "      \"objective\" : {\n",
      "        \"host-short-name\" : \"LabelObjective\",\n",
      "        \"class-name\" : \"org.tribuo.classification.sgd.objectives.LogMulticlass\"\n",
      "      }\n",
      "    },\n",
      "    \"os-arch\" : \"amd64\",\n",
      "    \"trained-at\" : \"2022-10-07T11:20:06.643297-04:00\",\n",
      "    \"os-name\" : \"Linux\",\n",
      "    \"dataset\" : {\n",
      "      \"num-features\" : \"4\",\n",
      "      \"num-examples\" : \"105\",\n",
      "      \"num-outputs\" : \"3\",\n",
      "      \"tribuo-version\" : \"4.3.0\",\n",
      "      \"datasource\" : {\n",
      "        \"train-proportion\" : \"0.7\",\n",
      "        \"seed\" : \"1\",\n",
      "        \"size\" : \"150\",\n",
      "        \"source\" : {\n",
      "          \"resource-hash\" : \"0FED2A99DB77EC533A62DC66894D3EC6DF3B58B6A8F3CF4A6B47E4086B7F97DC\",\n",
      "          \"headers\" : [ \"sepalLength\", \"sepalWidth\", \"petalLength\", \"petalWidth\", \"species\" ],\n",
      "          \"rowProcessor\" : {\n",
      "            \"metadataExtractors\" : [ ],\n",
      "            \"fieldProcessorList\" : [ {\n",
      "              \"fieldName\" : \"petalLength\",\n",
      "              \"onlyFieldName\" : \"true\",\n",
      "              \"throwOnInvalid\" : \"true\",\n",
      "              \"host-short-name\" : \"FieldProcessor\",\n",
      "              \"class-name\" : \"org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\"\n",
      "            }, {\n",
      "              \"fieldName\" : \"petalWidth\",\n",
      "              \"onlyFieldName\" : \"true\",\n",
      "              \"throwOnInvalid\" : \"true\",\n",
      "              \"host-short-name\" : \"FieldProcessor\",\n",
      "              \"class-name\" : \"org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\"\n",
      "            }, {\n",
      "              \"fieldName\" : \"sepalWidth\",\n",
      "              \"onlyFieldName\" : \"true\",\n",
      "              \"throwOnInvalid\" : \"true\",\n",
      "              \"host-short-name\" : \"FieldProcessor\",\n",
      "              \"class-name\" : \"org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\"\n",
      "            }, {\n",
      "              \"fieldName\" : \"sepalLength\",\n",
      "              \"onlyFieldName\" : \"true\",\n",
      "              \"throwOnInvalid\" : \"true\",\n",
      "              \"host-short-name\" : \"FieldProcessor\",\n",
      "              \"class-name\" : \"org.tribuo.data.columnar.processors.field.DoubleFieldProcessor\"\n",
      "            } ],\n",
      "            \"featureProcessors\" : [ ],\n",
      "            \"responseProcessor\" : {\n",
      "              \"uppercase\" : \"false\",\n",
      "              \"fieldNames\" : [ \"species\" ],\n",
      "              \"defaultValues\" : [ \"\" ],\n",
      "              \"displayField\" : \"false\",\n",
      "              \"outputFactory\" : {\n",
      "                \"class-name\" : \"org.tribuo.classification.LabelFactory\"\n",
      "              },\n",
      "              \"host-short-name\" : \"ResponseProcessor\",\n",
      "              \"class-name\" : \"org.tribuo.data.columnar.processors.response.FieldResponseProcessor\"\n",
      "            },\n",
      "            \"weightExtractor\" : {\n",
      "              \"class-name\" : \"org.tribuo.data.columnar.FieldExtractor\"\n",
      "            },\n",
      "            \"replaceNewlinesWithSpaces\" : \"true\",\n",
      "            \"regexMappingProcessors\" : { },\n",
      "            \"host-short-name\" : \"RowProcessor\",\n",
      "            \"class-name\" : \"org.tribuo.data.columnar.RowProcessor\"\n",
      "          },\n",
      "          \"file-modified-time\" : \"1999-12-14T15:12:39-05:00\",\n",
      "          \"quote\" : \"\\\"\",\n",
      "          \"outputRequired\" : \"true\",\n",
      "          \"datasource-creation-time\" : \"2022-10-07T11:20:06.279351-04:00\",\n",
      "          \"outputFactory\" : {\n",
      "            \"class-name\" : \"org.tribuo.classification.LabelFactory\"\n",
      "          },\n",
      "          \"separator\" : \",\",\n",
      "          \"host-short-name\" : \"DataSource\",\n",
      "          \"class-name\" : \"org.tribuo.data.csv.CSVDataSource\",\n",
      "          \"dataPath\" : \"/local/ExternalRepositories/tribuo/tutorials/bezdekIris.data\"\n",
      "        },\n",
      "        \"class-name\" : \"org.tribuo.evaluation.TrainTestSplitter\",\n",
      "        \"is-train\" : \"true\"\n",
      "      },\n",
      "      \"transformations\" : [ ],\n",
      "      \"is-sequence\" : \"false\",\n",
      "      \"is-dense\" : \"true\",\n",
      "      \"class-name\" : \"org.tribuo.MutableDataset\"\n",
      "    },\n",
      "    \"class-name\" : \"org.tribuo.classification.sgd.linear.LinearSGDModel\"\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "String jsonEvaluationProvenance = objMapper.writeValueAsString(ProvenanceUtil.convertToMap(evaluation.getProvenance()));\n",
    "System.out.println(jsonEvaluationProvenance);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that this provenance includes all the fields from the model's provenance, along with the test data, its split, and the CSV file it came from.\n",
    "\n",
    "This provenance information is useful on its own for tracking models, but when combined with the config system described in the configuration tutorial it becomes a powerful way of rebuilding models and experiments, allowing near-perfect replicability of any ML model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading and saving models\n",
    "In Tribuo 4.3 there are two mechanisms for loading and saving models. Java serialization is deprecated in 4.3 in favour of new support for serializing models and other Tribuo classes to protobufs. In the next major version we'll drop Java serialization and use only protobuf serialization, and models from Tribuo 4.3 stored as protobufs will be loadable in Tribuo 5. Serializable types in Tribuo now implement two interfaces, `java.io.Serializable` and `org.tribuo.protos.ProtoSerializable`, so they can be written to input and output streams in either format. In addition, `Model`, `Dataset`, `SequenceModel` and `SequenceDataset` have all gained helper methods for serializing and deserializing objects to protobuf files or streams.\n",
    "\n",
    "### Java serialization\n",
    "Here we'll go through saving and loading the model we just trained, but the procedure is the same for all other Tribuo models. We're going to save this out into the tutorials directory as this model file is used in the reproducibility tutorial.\n",
    "\n",
    "First we save the model out using an `ObjectOutputStream`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "File tmpFile = new File(\"iris-lr-model.ser\");\n",
    "try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(tmpFile))) {\n",
    "    oos.writeObject(irisModel);\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can load the saved model. We're going to use the serialization allow list that comes with Tribuo to ensure we only load Tribuo-related classes (the filtering mechanism is described in [JEP 290](https://openjdk.java.net/jeps/290)). This feature is available from Java 9 onwards, and as a process-wide feature in Java 8 from 8u121. Usually the pattern would be stored in code or as a classpath resource; here we're going to read it out of the Tribuo repository (assuming this notebook is in `Tribuo/tutorials`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "String filterPattern = Files.readAllLines(Paths.get(\"../docs/jep-290-filter.txt\")).get(0);\n",
    "ObjectInputFilter filter = ObjectInputFilter.Config.createFilter(filterPattern);\n",
    "Model<?> loadedModel;\n",
    "try (ObjectInputStream ois = new ObjectInputStream(new BufferedInputStream(new FileInputStream(tmpFile)))) {\n",
    "    ois.setObjectInputFilter(filter);\n",
    "    loadedModel = (Model<?>) ois.readObject();\n",
    "}"
   ]
  },
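  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get a feel for what an allow list does, here's a small sketch that uses only the JDK (no Tribuo classes are involved, and `FilterDemo`, `toBytes` and `readWithFilter` are hypothetical names for this example). It serializes a `java.util.ArrayList` and reads it back under two patterns built with `ObjectInputFilter.Config.createFilter`, the same call used in the cell above. With `java.util.*;java.lang.*;!*` the list is accepted, while with `java.lang.*;!*` the trailing `!*` rejects `java.util.ArrayList`, so `ObjectInputStream` throws an `InvalidClassException`, which the sketch reports as `false`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import java.io.*;\n",
    "import java.util.ArrayList;\n",
    "import java.util.List;\n",
    "\n",
    "// JDK-only sketch of JEP 290 pattern filters (FilterDemo is a hypothetical example class).\n",
    "public class FilterDemo {\n",
    "    // Serialize any Serializable object into a byte array.\n",
    "    public static byte[] toBytes(Serializable obj) throws IOException {\n",
    "        ByteArrayOutputStream bytes = new ByteArrayOutputStream();\n",
    "        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {\n",
    "            oos.writeObject(obj);\n",
    "        }\n",
    "        return bytes.toByteArray();\n",
    "    }\n",
    "\n",
    "    // Deserialize with a pattern filter; returns false if the filter rejects a class in the stream.\n",
    "    public static boolean readWithFilter(byte[] data, String pattern) throws IOException, ClassNotFoundException {\n",
    "        ObjectInputFilter filter = ObjectInputFilter.Config.createFilter(pattern);\n",
    "        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {\n",
    "            ois.setObjectInputFilter(filter);\n",
    "            ois.readObject();\n",
    "            return true;\n",
    "        } catch (InvalidClassException e) {\n",
    "            // ObjectInputStream reports a REJECTED filter status as an InvalidClassException.\n",
    "            return false;\n",
    "        }\n",
    "    }\n",
    "\n",
    "    public static void main(String[] args) throws Exception {\n",
    "        ArrayList<String> species = new ArrayList<>(List.of(\"setosa\", \"versicolor\"));\n",
    "        byte[] data = toBytes(species);\n",
    "        // java.util.ArrayList matches java.util.*, and its internal Object[] is covered by java.lang.*.\n",
    "        System.out.println(\"java.util.*;java.lang.*;!* accepts the list: \" + readWithFilter(data, \"java.util.*;java.lang.*;!*\"));\n",
    "        // Only java.lang classes are listed here, so the trailing !* rejects the ArrayList.\n",
    "        System.out.println(\"java.lang.*;!* accepts the list: \" + readWithFilter(data, \"java.lang.*;!*\"));\n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "FilterDemo.main(new String[0]);"
   ]
  },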
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Protobuf serialization\n",
    "\n",
    "Now we'll use the new protobuf support to save out the model, and then to load it back in."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "// First save the model\n",
    "Path outputPath = Paths.get(\"iris-lr-model.pb\");\n",
    "try {\n",
    "    irisModel.serializeToFile(outputPath);\n",
    "} catch (IOException e) { System.out.println(\"Exception when writing - \" + e); }\n",
    "\n",
    "// Then load it back in\n",
    "Model<?> loadedPBModel = Model.deserializeFromFile(outputPath);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As Tribuo's models are generically typed, and Java's generics are erased, models are loaded back in from either format with a wildcard type (`Model<?>`), which needs to be cast to a model of the correct `Output` subclass. Tribuo provides a mechanism for validating that the type is correct, `model.validate(Class<? extends Output<?>>)`, which returns true if the supplied class is the same as the output type stored inside the model. There's also `model.castModel(Class<U extends Output<U>>)`, which wraps up the validate check and either casts the model appropriately or throws a `ClassCastException` if the type is invalid."
   ]
  },
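  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The underlying problem is that erasure leaves no type argument to inspect at runtime, so a safe cast needs some runtime evidence of the type. The JDK-only sketch below illustrates the general pattern with a hypothetical `TypedBox<T>` class that simply stores a `Class<T>` token and checks casts against it. It's a simplified illustration under that assumption, not how Tribuo implements `validate` or `castModel` internally."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import java.util.Objects;\n",
    "\n",
    "// Hypothetical sketch of a validate/checked-cast pattern using a type token; not Tribuo's code.\n",
    "public class TypeTokenDemo {\n",
    "    public static final class TypedBox<T> {\n",
    "        private final Class<T> type; // runtime record of T, which erasure would otherwise discard\n",
    "        private final T value;\n",
    "\n",
    "        public TypedBox(Class<T> type, T value) {\n",
    "            this.type = Objects.requireNonNull(type);\n",
    "            this.value = value;\n",
    "        }\n",
    "\n",
    "        // True if the supplied class is exactly the stored type argument.\n",
    "        public boolean validate(Class<?> clazz) {\n",
    "            return type.equals(clazz);\n",
    "        }\n",
    "\n",
    "        // Checked cast: returns this box retyped as TypedBox<U>, or throws ClassCastException.\n",
    "        @SuppressWarnings(\"unchecked\")\n",
    "        public <U> TypedBox<U> castBox(Class<U> clazz) {\n",
    "            if (validate(clazz)) {\n",
    "                return (TypedBox<U>) this;\n",
    "            }\n",
    "            throw new ClassCastException(\"Box holds \" + type.getName() + \", not \" + clazz.getName());\n",
    "        }\n",
    "\n",
    "        public T get() {\n",
    "            return value;\n",
    "        }\n",
    "    }\n",
    "\n",
    "    public static void main(String[] args) {\n",
    "        // Like a freshly loaded Model<?>, only the wildcard type is known here.\n",
    "        TypedBox<?> loaded = new TypedBox<>(String.class, \"setosa\");\n",
    "        System.out.println(\"validate(String.class) = \" + loaded.validate(String.class));\n",
    "        System.out.println(\"validate(Integer.class) = \" + loaded.validate(Integer.class));\n",
    "\n",
    "        TypedBox<String> box = loaded.castBox(String.class);\n",
    "        System.out.println(\"Typed value: \" + box.get().toUpperCase());\n",
    "\n",
    "        try {\n",
    "            loaded.castBox(Integer.class);\n",
    "        } catch (ClassCastException e) {\n",
    "            System.out.println(\"Rejected: \" + e.getMessage());\n",
    "        }\n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "TypeTokenDemo.main(new String[0]);"
   ]
  },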
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "It's a Model<Label>!\n"
     ]
    }
   ],
   "source": [
    "if (loadedModel.validate(Label.class)) {\n",
    "    System.out.println(\"It's a Model<Label>!\");\n",
    "} else {\n",
    "    System.out.println(\"It's some other kind of Model.\");\n",
    "}\n",
    "\n",
    "Model<Label> model = loadedModel.castModel(Label.class);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can use this check to guard a cast to the appropriate generic type before using the model as normal.\n",
    "\n",
    "We'll check that the models are the same by comparing their provenances."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "true"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "loadedModel.getProvenance().equals(irisModel.getProvenance())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "We looked at Tribuo's CSV loading mechanism, how to train a simple classifier, how to evaluate a classifier on test data, what metadata and provenance information is stored inside Tribuo's `Model` and `Evaluation` objects, and finally how to save and load Tribuo's models."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Java",
   "language": "java",
   "name": "java"
  },
  "language_info": {
   "codemirror_mode": "java",
   "file_extension": ".jshell",
   "mimetype": "text/x-java-source",
   "name": "Java",
   "pygments_lexer": "java",
   "version": "12+33"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
